Extrinsic Evaluation of Sentence Alignment Systems
Authors
Abstract
Parallel corpora are usually a collection of documents which are translations of each other. To be useful in NLP applications such as word alignment or machine translation, they first have to be aligned at the sentence level. This paper is a user study briefly reviewing several sentence aligners and evaluating them based on the performance achieved by the SMT systems trained on their output. We conducted experiments on two language pairs and showed that using a more advanced sentence alignment algorithm may yield gains of 0.5 to 1 BLEU points.

Posted at the Zurich Open Repository and Archive (ZORA), University of Zurich. Accepted Version.
ZORA URL: https://doi.org/10.5167/uzh-62565

Originally published at: Abdul-Rauf, Sadaf; Fishel, Mark; Lambert, Patrik; Noubours, Sandra; Sennrich, Rico (2012). Extrinsic evaluation of sentence alignment systems. In: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles, Istanbul, 27 May 2012, 6-10.

Sadaf Abdul-Rauf, Mark Fishel, Patrik Lambert, Sandra Noubours, Rico Sennrich
LIUM, University of Le Mans, France sadaf.abdul-rauf,[email protected]
Institute of Computational Linguistics, University of Zurich, Switzerland sennrich,[email protected]
Fraunhofer FKIE, Wachtberg, Germany [email protected]
Similar resources
Chapter 19: Evaluation of parallel text alignment systems: The ARCADE project
This chapter describes the ARCADE project, concerned with the evaluation of parallel text alignment systems. The project is composed of two tracks, devoted to the evaluation of alignment at sentence and word level respectively, and is planned for a four-year period. At the time of this report, twelve systems have participated in the sentence track, and five in the word track. Substantial progre...
Sentence Alignment for Monolingual Comparable Corpora
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules. We incorporate context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic ...
Iterative, MT-based Sentence Alignment of Parallel Texts
Recent research has shown that MT-based sentence alignment is a robust approach for noisy parallel texts. However, using Machine Translation for sentence alignment causes a chicken-and-egg problem: to train a corpus-based MT system, we need sentence-aligned data, and MT-based sentence alignment depends on an MT system. We describe a bootstrapping approach to sentence alignment that resolves thi...
Evaluation of multilingual text alignment systems: the ARCADE II project
This paper describes the ARCADE II project, concerned with the evaluation of parallel text alignment systems. The ARCADE II project aims at exploring the techniques of multilingual text alignment through a fine evaluation of the existing techniques and the development of new alignment methods. The evaluation campaign consists of two tracks devoted to the evaluation of alignment at sentence and ...
Comparing Lexical Chain-based Summarisation Approaches Using an Extrinsic Evaluation
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also compare the chain scoring and extraction techniques...
Publication date: 2012